Add prompt optimizers to LiSSA #44
base: main
This commit introduces a prompt optimization system that automatically tunes classification prompts to improve trace link recovery performance.

Core Components:
- PromptOptimizer interface and IterativeOptimizer implementation for iterative, naive prompt refinement
- Optimization class as the main orchestrator for running optimization workflows
- OptimizeCommand CLI integration for running optimizations from the command line

Evaluation Framework:
- Evaluator base class using the strategy pattern for different evaluation approaches
- BruteForceEvaluator for exhaustive evaluation of prompt candidates
- UpperConfidenceBoundBanditEvaluator, which uses the UCB1 algorithm to balance exploration and exploitation and reduce computational cost
- MockEvaluator for testing purposes

Metrics System:
- Metric interface with PointwiseMetric and GlobalMetric implementations
- FBetaMetric for computing precision, recall, and F-beta scores
- Scorer interface with BinaryScorer for scoring trace link classifications
- Reductor interface with MeanReductor for aggregating pointwise metrics
- Factory classes (MetricFactory, ScorerFactory, ReductorFactory) for instantiation

Infrastructure Updates:
- Extended ModuleConfiguration with prompt management capabilities
- Added OptimizerConfiguration for optimizer-specific settings
- Enhanced CacheManager with scorer-specific caching via ScorerCacheKey
- Added ChatLanguageModelUtils with helpers for LLM interaction
- Updated the Statistics class with methods for computing evaluation metrics

Classifier Modifications:
- Added setClassificationPrompt() and getClassificationPromptKey() to the abstract Classifier class to support dynamic prompt updates
- Replaced cache parameters with a map

Testing:
- Added PromptOptimizationUtilsTest with Markdown escaping tests
- Extended Requirement2RequirementE2ETest with optimizer test cases
- Added test resources with expected outputs and cache files for the iterative and simple optimizer configurations
- Updated ArchitectureTest to ensure correct usage of dangerous overwrite methods

Configuration:
- Added example-configs/optimizer-config.json demonstrating optimizer setup
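To make the difference between the two metric families concrete, here is a minimal sketch, assuming a pointwise metric scores each trace link individually (a binary scorer returning 1.0 or 0.0) and then averages the scores with a mean reductor, while a global metric such as F-beta is computed once over the complete predicted and gold link sets. `MetricSketch` and its `TraceLink` record are hypothetical illustrations, not the LiSSA API.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch: names echo the PR description but are not the LiSSA API.
final class MetricSketch {
    private MetricSketch() {}

    /** Assumed representation of a candidate trace link. */
    record TraceLink(String source, String target) {}

    /** Binary scorer: 1.0 if the classifier's decision matches the gold label, else 0.0. */
    static double binaryScore(boolean predictedLink, boolean goldLink) {
        return predictedLink == goldLink ? 1.0 : 0.0;
    }

    /** Mean reductor: aggregates pointwise scores into a single value (0.0 for no scores). */
    static double mean(List<Double> scores) {
        return scores.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    /** Global F-beta: (1 + b^2) * P * R / (b^2 * P + R), computed over the full link sets. */
    static double fBeta(Set<TraceLink> predicted, Set<TraceLink> gold, double beta) {
        long truePositives = predicted.stream().filter(gold::contains).count();
        if (truePositives == 0) {
            return 0.0; // avoids division by zero when no correct link was found
        }
        double precision = (double) truePositives / predicted.size();
        double recall = (double) truePositives / gold.size();
        double betaSquared = beta * beta;
        return (1 + betaSquared) * precision * recall / (betaSquared * precision + recall);
    }
}
```

With beta = 1 this is the usual F1 score; beta > 1 weights recall more heavily than precision.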
@DanielDango I've opened a new pull request, #45, to work on those changes. Once the pull request is ready, I'll request review from you.
Pull request overview
This PR introduces a comprehensive prompt optimization framework to the LiSSA trace link analysis system. The framework enables iterative improvement of classification prompts through automated optimization strategies, integrating metrics, evaluators, and optimizer implementations.
Key Changes:
- Adds prompt optimizer framework with iterative optimization support
- Introduces metric system (pointwise and global metrics) for evaluating prompt performance
- Implements evaluator strategies (brute-force and Upper Confidence Bound bandit algorithms)
- Extends configuration system to support optimization workflows
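The bandit-based evaluator can be pictured as treating each prompt candidate as an arm: every pull evaluates one candidate on one sampled item, and UCB1 picks the arm that maximizes its empirical mean plus an exploration bonus c * sqrt(ln n / n_i), where n is the total number of pulls so far and n_i the pulls of arm i. The following is a minimal sketch under that assumption; `Ucb1Selector` and the reward function are hypothetical, rewards are assumed to lie in [0, 1], and c = sqrt(2) gives the classic UCB1 bound.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch of UCB1 over prompt candidates; not the LiSSA evaluator API.
final class Ucb1Selector {
    private final double explorationConstant;

    Ucb1Selector(double explorationConstant) {
        this.explorationConstant = explorationConstant;
    }

    /**
     * Spends {@code budget} reward evaluations across the candidates and returns the index of
     * the candidate with the highest empirical mean. Rewards are assumed to lie in [0, 1].
     */
    <T> int select(List<T> candidates, ToDoubleFunction<T> reward, int budget) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        int[] pulls = new int[candidates.size()];
        double[] rewardSums = new double[candidates.size()];
        for (int totalPulls = 0; totalPulls < budget; totalPulls++) {
            int arm = chooseArm(pulls, rewardSums, totalPulls);
            rewardSums[arm] += reward.applyAsDouble(candidates.get(arm));
            pulls[arm]++;
        }
        int best = 0;
        for (int i = 1; i < pulls.length; i++) {
            if (mean(rewardSums, pulls, i) > mean(rewardSums, pulls, best)) {
                best = i;
            }
        }
        return best;
    }

    private int chooseArm(int[] pulls, double[] rewardSums, int totalPulls) {
        int bestArm = 0;
        double bestBound = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < pulls.length; i++) {
            if (pulls[i] == 0) {
                return i; // initialization: try every candidate once
            }
            // totalPulls >= number of arms >= 1 here, so the logarithm is non-negative
            double bonus = explorationConstant * Math.sqrt(Math.log(totalPulls) / pulls[i]);
            double bound = rewardSums[i] / pulls[i] + bonus;
            if (bound > bestBound) {
                bestBound = bound;
                bestArm = i;
            }
        }
        return bestArm;
    }

    private static double mean(double[] rewardSums, int[] pulls, int i) {
        return pulls[i] == 0 ? 0.0 : rewardSums[i] / pulls[i];
    }
}
```

Compared with brute-force evaluation, which scores every candidate on every item, the budget caps the total number of LLM calls, and weak candidates are sampled less and less often as their upper bound falls behind the leader's.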
Reviewed changes
Copilot reviewed 52 out of 54 changed files in this pull request and generated 28 comments.
| File | Description |
|---|---|
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/* | Core optimizer interfaces and implementations (IterativeOptimizer, MockOptimizer, OptimizerFactory, utilities) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/* | Metric system with scorers (BinaryScorer), reductors (MeanReductor), and metric implementations (PointwiseMetric, GlobalMetric, FBetaMetric) |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/* | Evaluator base class with BruteForceEvaluator and UpperConfidenceBoundBanditEvaluator implementations |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Optimization.java | New optimization pipeline that extends evaluation with prompt optimization step |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/Evaluation.java | Extended to support prompt overwriting for optimization workflow |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/configuration/* | OptimizerConfiguration record and ModuleConfiguration setArgument methods |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/* | Added getCacheParameters() method and changed copyOf() visibility to public |
| src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/* | New ScorerCacheKey and CacheManager enhancements for optimizer caching |
| src/main/java/edu/kit/kastel/sdq/lissa/cli/command/OptimizeCommand.java | CLI command for running optimization and optional evaluation pipelines |
| src/test/java/edu/kit/kastel/sdq/lissa/ratlr/* | Test files including PromptOptimizationUtilsTest and E2E optimizer tests |
| src/test/resources/warc/* | Test configurations and expected outputs for optimizer validation |
| example-configs/optimizer-config.json | Example optimizer configuration file |
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/IterativeOptimizer.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/EvaluationResult.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/EvaluationResult.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
Pull request overview
Copilot reviewed 52 out of 54 changed files in this pull request and generated 10 comments.
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ChatLanguageModelProvider.java (outdated, resolved)
- src/test/java/edu/kit/kastel/sdq/lissa/ratlr/ArchitectureTest.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/GlobalMetric.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/GlobalMetric.java (outdated, resolved)
- src/test/java/edu/kit/kastel/sdq/lissa/ratlr/e2e/Requirement2RequirementE2ETest.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
…ented by @copilot on #44
- refine iterative optimizer with feedback on misclassified trace links (false positives or false negatives)
- add sample strategy to configure which TLs are used as feedback
- add SLF4J Simple Logger configuration for improved logging in tests
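One way to derive the misclassified trace links that the feedback step relies on is plain set difference between predicted and gold links. This is a sketch under that assumption: `MisclassificationFeedback`, the feedback line format, and the `limit` parameter (standing in for a simple "take the first N" sample strategy) are hypothetical and not taken from the PR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: derives optimizer feedback from misclassified trace links.
final class MisclassificationFeedback {
    private MisclassificationFeedback() {}

    record TraceLink(String source, String target) {}

    /** Links the classifier accepted that are not in the gold standard. */
    static Set<TraceLink> falsePositives(Set<TraceLink> predicted, Set<TraceLink> gold) {
        Set<TraceLink> result = new LinkedHashSet<>(predicted);
        result.removeAll(gold);
        return result;
    }

    /** Gold-standard links the classifier missed. */
    static Set<TraceLink> falseNegatives(Set<TraceLink> predicted, Set<TraceLink> gold) {
        Set<TraceLink> result = new LinkedHashSet<>(gold);
        result.removeAll(predicted);
        return result;
    }

    /** Formats at most {@code limit} misclassifications (false positives first) as feedback lines. */
    static List<String> feedbackLines(Set<TraceLink> predicted, Set<TraceLink> gold, int limit) {
        List<String> lines = new ArrayList<>();
        for (TraceLink link : falsePositives(predicted, gold)) {
            lines.add("False positive: " + link.source() + " -> " + link.target());
        }
        for (TraceLink link : falseNegatives(predicted, gold)) {
            lines.add("False negative: " + link.source() + " -> " + link.target());
        }
        int end = Math.max(0, Math.min(limit, lines.size()));
        return new ArrayList<>(lines.subList(0, end));
    }
}
```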
…g and Simple Classifiers
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… feature/add-prompt-optimization-module

Conflicts:
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ChatLanguageModelProvider.java
@DanielDango I've opened a new pull request, #49, to work on those changes. Once the pull request is ready, I'll request review from you.
…tive optimization prompt
- update e2e test cases
- add missing spaces in optimization prompt
- revert "revert reduced target store deduplication"

This reverts commit 63df5fb.
…optimization-module
6bef3cb to a14496b
Pull request overview
Copilot reviewed 86 out of 89 changed files in this pull request and generated 9 comments.
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptoptimizer/IterativeFeedbackOptimizer.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/MockClassifier.java (outdated, resolved)
```java
     * @param config The configuration object
     * @throws IOException If there are issues setting up the cache
     */
    public Evaluation(Configuration config) throws IOException {
        this.configuration = config;
        // TODO maybe dont?
```
Copilot (AI), Jan 22, 2026
The comment states "TODO maybe dont?" regarding setting configFile to null. This suggests uncertainty about the design. Null config files could cause NullPointerExceptions downstream. Either handle this case properly or document why null is acceptable here.
Suggested change:

```diff
-     * @param config The configuration object
-     * @throws IOException If there are issues setting up the cache
-     */
-    public Evaluation(Configuration config) throws IOException {
-        this.configuration = config;
-        // TODO maybe dont?
+     * <p>
+     * Note: When using this constructor there is no associated configuration file on disk.
+     * Consequently, {@link #configFile} is set to {@code null} by design and any code accessing
+     * it must first check for {@code null}.
+     * </p>
+     *
+     * @param config The configuration object
+     * @throws IOException If there are issues setting up the cache
+     */
+    public Evaluation(Configuration config) throws IOException {
+        this.configuration = config;
+        // No configuration file is associated with this instance; configFile remains null by design.
```
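If `configFile` is meant to be optional, an alternative to documenting the `null` is to expose it as an `Optional<Path>`, so callers cannot forget the check. A minimal sketch; `EvaluationSketch` and `describeSource()` are hypothetical names, not the actual `Evaluation` class.

```java
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical stand-in for an evaluation that may or may not be backed by a config file.
final class EvaluationSketch {
    /** Null when the instance was built from an in-memory configuration. */
    private final Path configFile;

    EvaluationSketch(Path configFile) {
        this.configFile = configFile;
    }

    /** In-memory variant: there is no configuration file on disk. */
    EvaluationSketch() {
        this(null);
    }

    /** Callers must handle the empty case explicitly instead of risking a NullPointerException. */
    Optional<Path> configFile() {
        return Optional.ofNullable(configFile);
    }

    String describeSource() {
        return configFile()
                .map(path -> "file " + path.getFileName())
                .orElse("in-memory configuration");
    }
}
```

The field stays nullable internally; only the accessor changes, so the existing constructors keep working.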
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/evaluator/UpperConfidenceBoundBanditEvaluator.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/PointwiseMetric.java (resolved)
@DanielDango I think the challenge now is to merge in the changes from #48. Afterward, I can review this part of the PR :)
…optimization-module

Conflicts:
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/CacheKey.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/CacheManager.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/CacheParameter.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/classifier/ClassifierCacheKey.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/classifier/ClassifierCacheParameter.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/embedding/EmbeddingCacheKey.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/embedding/EmbeddingCacheParameter.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ReasoningClassifier.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/SimpleClassifier.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/embeddingcreator/CachedEmbeddingCreator.java
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/preprocessor/SummarizePreprocessor.java
- src/test/java/edu/kit/kastel/sdq/lissa/ratlr/ArchitectureTest.java
@dfuchss the cache update has been merged.
Pull request overview
Copilot reviewed 74 out of 77 changed files in this pull request and generated 6 comments.
```
    @Override
    public String getClassificationPromptKey() {
    public static String getClassificationPromptKey() {
        return PROMPT_TEMPLATE_KEY;
```
Copilot (AI), Jan 22, 2026
The method getClassificationPromptKey() has been changed from an instance method to a static method in SimpleClassifier and ReasoningClassifier. However, the method is still being removed from the abstract Classifier class rather than being changed to static there. This creates an inconsistency where individual classifier implementations have static methods but there's no common static interface. Consider whether a static factory method in the base Classifier class would be more appropriate, as shown in the new createClassificationPromptKey() method.
Suggested change:

```diff
-        return PROMPT_TEMPLATE_KEY;
+        return Classifier.createClassificationPromptKey(PROMPT_TEMPLATE_KEY);
```
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/ClassificationResult.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/classifier/ClassifierCacheKey.java (outdated, resolved)
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/cache/embedding/EmbeddingCacheKey.java (outdated, resolved)
```java
        // TODO: careful, this is a hack to allow the optimization to overwrite the prompt and store it to the config
        // for serialization. Maybe you can utilize ModuleConfiguration.with() instead?
        if (!prompt.isEmpty()) {
            configuration
                    .classifier()
                    .setArgument(Classifier.createClassificationPromptKey(configuration.classifier()), prompt);
```
Copilot (AI), Jan 22, 2026
The TODO comment indicates this is a known workaround that should be addressed. The hack modifies the configuration's classifier argument directly to inject the optimized prompt. Consider using ModuleConfiguration.with() or a cleaner approach to avoid mutating configuration state after initialization, which could lead to inconsistencies and make the code harder to maintain.
Suggested change:

```diff
-        // TODO: careful, this is a hack to allow the optimization to overwrite the prompt and store it to the config
-        // for serialization. Maybe you can utilize ModuleConfiguration.with() instead?
-        if (!prompt.isEmpty()) {
-            configuration
-                    .classifier()
-                    .setArgument(Classifier.createClassificationPromptKey(configuration.classifier()), prompt);
+        // Update the classifier configuration immutably so the optimized prompt is stored for serialization
+        if (!prompt.isEmpty()) {
+            var promptKey = Classifier.createClassificationPromptKey(configuration.classifier());
+            var updatedClassifierConfig = configuration.classifier().withArgument(promptKey, prompt);
+            configuration = configuration.withClassifier(updatedClassifierConfig);
```
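The suggested change relies on `withArgument` and `withClassifier` methods. Whether LiSSA's `ModuleConfiguration` and `Configuration` provide them is not shown in this thread, so the following is only a hypothetical sketch of how such copy-on-write helpers could look for record-based configurations; `ModuleConfig` and `RunConfig` are illustrative stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical record-based configuration illustrating copy-on-write "with" methods.
record ModuleConfig(String name, Map<String, String> arguments) {
    ModuleConfig {
        arguments = Map.copyOf(arguments); // unmodifiable snapshot; later caller changes cannot leak in
    }

    /** Returns a new configuration with one argument added or replaced; this instance is unchanged. */
    ModuleConfig withArgument(String key, String value) {
        Map<String, String> updated = new HashMap<>(arguments);
        updated.put(key, value);
        return new ModuleConfig(name, updated);
    }
}

// Hypothetical top-level configuration holding the classifier module.
record RunConfig(ModuleConfig classifier) {
    /** Returns a copy of this run configuration that uses the given classifier configuration. */
    RunConfig withClassifier(ModuleConfig newClassifier) {
        return new RunConfig(newClassifier);
    }
}
```

Because the base configuration is never mutated, the optimized prompt only exists in the returned copy, which avoids the state inconsistencies the TODO warns about.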
- src/main/java/edu/kit/kastel/sdq/lissa/ratlr/classifier/Classifier.java (outdated, resolved)
…ation prompt without modifying the base configuration
dfuchss left a comment
I left some comments. Nice work :)
```java
        for (Path optimizationConfig : configsToOptimize) {
            LOGGER.info("Invoking the optimization pipeline with '{}'", optimizationConfig);
            String optimizedPrompt = "";
            try {
                var optimization = new Optimization(optimizationConfig);
                optimizedPrompt = optimization.run();
            } catch (IOException e) {
                LOGGER.warn(
                        "Optimization configuration '{}' threw an exception: {} \n Maybe the file does not exist?",
                        optimizationConfig,
                        e.getMessage());
            }
            for (Path evaluationConfig : configsToEvaluate) {
                runEvaluation(evaluationConfig, optimizedPrompt);
            }
        }
```
Move to own method to match the description in the JavaDoc :)
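A sketch of the requested refactoring, assuming the try/catch around the optimization run is what should move into its own method. `OptimizationRunner` and `EvaluationRunner` are hypothetical functional interfaces standing in for `new Optimization(config).run()` and `runEvaluation(config, prompt)`, and `System.err` replaces the SLF4J `LOGGER` only to keep the example dependency-free.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of extracting the per-configuration optimization step into its own method.
final class OptimizeCommandSketch {
    /** Stand-in for {@code new Optimization(config).run()}. */
    interface OptimizationRunner {
        String run(Path config) throws IOException;
    }

    /** Stand-in for {@code runEvaluation(config, optimizedPrompt)}. */
    interface EvaluationRunner {
        void run(Path config, String optimizedPrompt);
    }

    private final OptimizationRunner optimizer;
    private final EvaluationRunner evaluator;

    OptimizeCommandSketch(OptimizationRunner optimizer, EvaluationRunner evaluator) {
        this.optimizer = optimizer;
        this.evaluator = evaluator;
    }

    void runAll(List<Path> configsToOptimize, List<Path> configsToEvaluate) {
        for (Path optimizationConfig : configsToOptimize) {
            String optimizedPrompt = runOptimization(optimizationConfig);
            for (Path evaluationConfig : configsToEvaluate) {
                evaluator.run(evaluationConfig, optimizedPrompt);
            }
        }
    }

    /** Runs one optimization; returns "" (no prompt override) if the configuration cannot be processed. */
    String runOptimization(Path optimizationConfig) {
        try {
            return optimizer.run(optimizationConfig);
        } catch (IOException e) {
            // The real command logs via SLF4J; System.err keeps this sketch dependency-free.
            System.err.printf("Optimization configuration '%s' failed: %s%n", optimizationConfig, e.getMessage());
            return "";
        }
    }
}
```

Returning an empty string keeps the existing contract: the evaluation only overwrites the classifier prompt when the prompt is non-empty.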
```
public final class ClassifierCacheKey implements CacheKey {
    private final String model;
    private final int seed;
    private final double temperature;
    private final LargeLanguageModelCacheMode mode;
    private final String content;

    @JsonIgnore
    private final String localKey;
public record ClassifierCacheKey(
        String model,
        int seed,
        double temperature,
        LargeLanguageModelCacheMode mode,
        String content,
        @JsonIgnore String localKey)
        implements CacheKey {
```
Please do not revert the change to ClassifierCacheKey.
```
     * @return A new classifier cache key
     */
    static ClassifierCacheKey of(ClassifierCacheParameter cacheParameter, String content) {
    public static ClassifierCacheKey of(ClassifierCacheParameter cacheParameter, String content) {
```
They shall not be public
```xml
<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-params</artifactId>
    <version>${junit.version}</version>
```
Why? That should be managed by the parent.
```xml
<groupId>com.diffplug.spotless</groupId>
<artifactId>spotless-maven-plugin</artifactId>
<configuration combine.self="append">
    <lineEndings>UNIX</lineEndings>
```
Nice catch. I will move that to the parent.
```java
// TODO: careful, this is a hack to allow the optimization to overwrite the prompt and store it to the config
// for serialization. Maybe you can utilize ModuleConfiguration.with() instead?
```
If possible, resolve the TODO.
```java
/**
 * Separator used in configuration names.
 */
public static final String CONFIG_NAME_SEPARATOR = "_";
```
Isn't that already defined publicly in Configuration?
```java
public static final String FIRST_SAMPLER = "first";
public static final String ORDERED_SAMPLER = "ordered";
public static final String SHUFFLED_SAMPLER = "shuffled";
```
Are these strings used elsewhere?
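If the sampler names are needed in several places (configuration parsing, factories, tests), one option is to centralize them in an enum that maps configuration strings to constants. A hypothetical sketch; `SampleStrategy` and `fromConfigName` are illustrative names, and the sampling behavior behind each constant is deliberately left out because it is not described in this thread.

```java
import java.util.Locale;

// Hypothetical enum centralizing the sampler names used in configuration files.
enum SampleStrategy {
    FIRST("first"),
    ORDERED("ordered"),
    SHUFFLED("shuffled");

    private final String configName;

    SampleStrategy(String configName) {
        this.configName = configName;
    }

    /** The string expected in configuration files. */
    String configName() {
        return configName;
    }

    /** Parses a configuration value case-insensitively and rejects unknown names. */
    static SampleStrategy fromConfigName(String name) {
        String normalized = name.trim().toLowerCase(Locale.ROOT);
        for (SampleStrategy strategy : values()) {
            if (strategy.configName.equals(normalized)) {
                return strategy;
            }
        }
        throw new IllegalArgumentException("Unknown sample strategy: " + name);
    }
}
```

Compared with loose `String` constants, an unknown value in a configuration fails at parse time, and `switch` statements over the enum can be checked for exhaustiveness.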
Pull request overview
Copilot reviewed 72 out of 75 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/main/java/edu/kit/kastel/sdq/lissa/ratlr/promptmetric/GlobalMetric.java:1
- The logger variable name 'logger' is inconsistent with the naming convention used in other classes in the codebase, which use 'LOGGER' in uppercase. For consistency, this should be renamed to 'LOGGER'.
```java
    public Evaluation(Configuration config) throws IOException {
        this.configuration = config;
        // TODO maybe dont?
        this.configFile = null;
```
Copilot (AI), Jan 22, 2026
The TODO comment on line 152 suggests uncertainty about setting configFile to null. This should either be resolved with a proper implementation or the TODO should be removed if null is the intended behavior. Consider clarifying whether configFile is optional for evaluation instances created from Configuration objects.
This PR aims to introduce the prompt optimizers developed during my bachelor's thesis into the LiSSA framework.
They can be used on their own or chained with regular evaluation pipeline runs.
The ProTeGi optimizer is suggested in #57.